cache lookup failed for index

cache lookup failed for index

am 30.06.2010 04:21:25 von Nathan Robertson

--001636c5c209be7243048a3602f1
Content-Type: text/plain; charset=ISO-8859-1

Hi Everyone,

So I have a small web cluster that I'm running PSQL on. today, Apache failed
which caused PSQL to fail which at some point caused some data in the
database to become corrupt. So, now when I try to connect to the data in the
database in question I get: FATAL: cache lookup failed for index 2662. I
get this when I try to connect on the standalone console or when I try to
connect to the running server. So, this is going to sound terrible but there
is no current backup of this database. As the admin this is really
embarrassing and I know I messed up.

That said, could someone point me in the right direction for restoring this?
I've tried to reindex which doesn't work because I can't even connect to the
database in question. If I try to connect through the standalone console to
the system and reindex the entire system the same error message pops up in
the list.

What information can I provide that would help?

CentOS 5.5
PostgreSQL 8.1.21

This is the information from postgresql-Tue.log:
LOG: database system was interrupted at 2010-06-29 17:08:49 EDT
LOG: checkpoint record is at 0/3788A208
LOG: redo record is at 0/3788A208; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 610809; next OID: 28328
LOG: next MultiXactId: 8; next MultiXactOffset: 15
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: record with zero length at 0/3788A258
LOG: redo is not required
LOG: database system is ready
LOG: transaction ID wrap limit is 2147484146, limited by database
"postgres"
LOG: database system was interrupted at 2010-06-29 19:52:08 EDT
LOG: checkpoint record is at 0/3788A258
LOG: redo record is at 0/3788A258; undo record is at 0/0; shutdown TRUE
LOG: next transaction ID: 610809; next OID: 28328
LOG: next MultiXactId: 8; next MultiXactOffset: 15
LOG: database system was not properly shut down; automatic recovery in
progress
LOG: record with zero length at 0/3788A2A8
LOG: redo is not required
LOG: database system is ready
LOG: transaction ID wrap limit is 2147484146, limited by database
"postgres"
FATAL: cache lookup failed for index 2662

Any help in this matter would be GREATLY appreciated.

Nate Robertson

--001636c5c209be7243048a3602f1
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi Everyone,

So I have a small web cluster that I'm running PSQL=
on. today, Apache failed which caused PSQL to fail which at some point cau=
sed some data in the database to become corrupt. So, now when I try to conn=
ect to the data in the database in question I get: FATAL:=A0 cache lookup f=
ailed for index 2662. I get this when I try to connect on the standalone co=
nsole or when I try to connect to the running server. So, this is going to =
sound terrible but there is no current backup of this database. As the admi=
n this is really embarrassing and I know I messed up.


That said, could someone point me in the right direction for restoring =
this? I've tried to reindex which doesn't work because I can't =
even connect to the database in question. If I try to connect through the s=
tandalone console to the system and reindex the entire system the same erro=
r message pops up in the list.


What information can I provide that would help?

CentOS 5.5
Po=
stgreSQL 8.1.21

This is the information from postgresql-Tue.log:
=
LOG:=A0 database system was interrupted at 2010-06-29 17:08:49 EDT
LOG:=
=A0 checkpoint record is at 0/3788A208

LOG:=A0 redo record is at 0/3788A208; undo record is at 0/0; shutdown TRUE<=
br>LOG:=A0 next transaction ID: 610809; next OID: 28328
LOG:=A0 next Mul=
tiXactId: 8; next MultiXactOffset: 15
LOG:=A0 database system was not pr=
operly shut down; automatic recovery in progress

LOG:=A0 record with zero length at 0/3788A258
LOG:=A0 redo is not requir=
ed
LOG:=A0 database system is ready
LOG:=A0 transaction ID wrap limit=
is 2147484146, limited by database "postgres"
LOG:=A0 databas=
e system was interrupted at 2010-06-29 19:52:08 EDT

LOG:=A0 checkpoint record is at 0/3788A258
LOG:=A0 redo record is at 0/3=
788A258; undo record is at 0/0; shutdown TRUE
LOG:=A0 next transaction I=
D: 610809; next OID: 28328
LOG:=A0 next MultiXactId: 8; next MultiXactOf=
fset: 15

LOG:=A0 database system was not properly shut down; automatic recovery in p=
rogress
LOG:=A0 record with zero length at 0/3788A2A8
LOG:=A0 redo is=
not required
LOG:=A0 database system is ready
LOG:=A0 transaction ID=
wrap limit is 2147484146, limited by database "postgres"

FATAL:=A0 cache lookup failed for index 2662

Any help in this matter=
would be GREATLY appreciated.

Nate Robertson


--001636c5c209be7243048a3602f1--

Re: cache lookup failed for index

am 30.06.2010 15:36:20 von Kevin Grittner

Nathan Robertson wrote:

> Apache failed which caused PSQL to fail which at some point caused
> some data in the database to become corrupt.

What? What does Apache have to do with psql (a command-line client
for PostgreSQL), and how would either of those failing cause a
database corruption? Could you give more details? Did the OS lock
up at some point? Was there any power loss while PostgreSQL was
running?

Without an event like that, database corruption is often the result
of hardware problems. Testing your RAM and making sure you have
SMART monitoring working on the drives, with due attention to any
reported problems, might be a good idea; otherwise, whatever you
recover may be further damaged.

Also, it would help a lot to know what your postgresql.conf file
contains (excluding all comments).

But first and foremost, you should make a file-copy backup of your
entire PostgreSQL data directory tree with the PostgreSQL server
stopped, if you haven't done that already. Any attempt at recovery
may misfire, and you might want to get back to what you have now.

-Kevin

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: cache lookup failed for index

am 30.06.2010 15:42:56 von Nathan Robertson

--001485f9ab0e003982048a3f88cf
Content-Type: text/plain; charset=ISO-8859-1

There was a cascade effect. Apache failed which caused the server overall to
fail. The data is stored on an iSCSI drive and the mount of the iSCSI drive
became corrupt when everything failed. I was able to remount the drive and
get access to data now I have this index error.

So, this is where I'm at. If anyone could help resolve the index cache error
I would be eternally great full.

On Wed, Jun 30, 2010 at 9:36 AM, Kevin Grittner > wrote:

> Nathan Robertson wrote:
>
> > Apache failed which caused PSQL to fail which at some point caused
> > some data in the database to become corrupt.
>
> What? What does Apache have to do with psql (a command-line client
> for PostgreSQL), and how would either of those failing cause a
> database corruption? Could you give more details? Did the OS lock
> up at some point? Was there any power loss while PostgreSQL was
> running?
>
> Without an event like that, database corruption is often the result
> of hardware problems. Testing your RAM and making sure you have
> SMART monitoring working on the drives, with due attention to any
> reported problems, might be a good idea; otherwise, whatever you
> recover may be further damaged.
>
> Also, it would help a lot to know what your postgresql.conf file
> contains (excluding all comments).
>
> But first and foremost, you should make a file-copy backup of your
> entire PostgreSQL data directory tree with the PostgreSQL server
> stopped, if you haven't done that already. Any attempt at recovery
> may misfire, and you might want to get back to what you have now.
>
> -Kevin
>

--001485f9ab0e003982048a3f88cf
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

There was a cascade effect. Apache failed which caused the server overall t=
o fail. The data is stored on an iSCSI drive and the mount of the iSCSI dri=
ve became corrupt when everything failed. I was able to remount the drive a=
nd get access to data now I have this index error.


So, this is where I'm at. If anyone could help resolve the index ca=
che error I would be eternally great full.

e">On Wed, Jun 30, 2010 at 9:36 AM, Kevin Grittner < href=3D"mailto:Kevin.Grittner@wicourts.gov">Kevin.Grittner@w icourts.gov >> wrote:

r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
>Nathan Robertson <nathan.=
robertson@gmail.com
> wrote:




> Apache failed which caused PSQL to fail which at some point caused

> some data in the database to become corrupt.



What? =A0What does Apache have to do with psql (a command-line client=


for PostgreSQL), and how would either of those failing cause a

database corruption? =A0Could you give more details? =A0Did the OS lock

up at some point? =A0Was there any power loss while PostgreSQL was

running?



Without an event like that, database corruption is often the result

of hardware problems. =A0Testing your RAM and making sure you have

SMART monitoring working on the drives, with due attention to any

reported problems, might be a good idea; otherwise, whatever you

recover may be further damaged.



Also, it would help a lot to know what your postgresql.conf file

contains (excluding all comments).



But first and foremost, you should make a file-copy backup of your

entire PostgreSQL data directory tree with the PostgreSQL server

stopped, if you haven't done that already. =A0Any attempt at recovery r>
may misfire, and you might want to get back to what you have now.



-Kevin




--001485f9ab0e003982048a3f88cf--

Re: cache lookup failed for index

am 30.06.2010 16:14:23 von Kevin Grittner

Nathan Robertson wrote:

> There was a cascade effect. Apache failed which caused the server
> overall to fail. The data is stored on an iSCSI drive and the
> mount of the iSCSI drive became corrupt when everything failed. I
> was able to remount the drive and get access to data now I have
> this index error.

Now we're getting somewhere. The disk drive "became corrupt" while
PostgreSQL was running? Was the drive unmounted or remounted while
PostgreSQL was running, or did you stop PostgreSQL first? Do you
have any errors in the PostgreSQL log from the time this was all
going on?

Also, how confident are you that the Apache failure caused the drive
to be corrupted? That sounds *much* less likely than the other way
around. Without understanding that better, fixing one particular
problem in the database on this machine might be like rearranging
deck chairs on a sinking ship.

> So, this is where I'm at. If anyone could help resolve the index
> cache error I would be eternally great full.

We'd like to help, and perhaps someone else can suggest something on
the basis of information you've provided so far, but I'm not
comfortable suggesting something without a little more of a sense of
what happened and what your configuration is.

>> Also, it would help a lot to know what your postgresql.conf file
>> contains (excluding all comments).

This would still be useful.

>> But first and foremost, you should make a file-copy backup of
>> your entire PostgreSQL data directory tree with the PostgreSQL
>> server stopped, if you haven't done that already. Any attempt at
>> recovery may misfire, and you might want to get back to what you
>> have now.

I can't, in good conscience, recommend any recovery attempts until
you confirm that you have a copy to restore if the cleanup effort
misfires.

One more question occurs to me -- it seems unusual for someone to be
running on a single disk with no RAID and no backup, but to be
running with a version of PostgreSQL with is only about a month old.
Was 8.1.21 the version you were running at the time of the failure,
or have you upgraded during the recovery attempt? If you've
upgraded, the version in use when the corruption occurred could be
relevant.

-Kevin

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: cache lookup failed for index

am 30.06.2010 16:36:14 von Nathan Robertson

--001636c5c1959dfc04048a40468c
Content-Type: text/plain; charset=ISO-8859-1

Hi Kevin,

Thanks for the response.

Now we're getting somewhere. The disk drive "became corrupt" while
PostgreSQL was running? Was the drive unmounted or remounted while
PostgreSQL was running, or did you stop PostgreSQL first? Do you
have any errors in the PostgreSQL log from the time this was all
going on?

The failure basically happened because the Django webapp we're running isn't
effectively closing database connections. So, memory is completely filling
up and causing the server to hang. Yesterday, when this happened it caused
the entire network interface to become inoperable which meant that the iscsi
connection to the shared drive stopped working and data became corrupt.

I stopped the postgresql service before unmounting and remounting the
target.

My first concern is restoring the database. I'll fix the problems with
django and apache later. I can deal with those problems. I'm also going to
create a series of database backups that can be used to quickly restore data
if this happens again. My concern is simply just getting this back to
baseline.

One more question occurs to me -- it seems unusual for someone to be
running on a single disk with no RAID and no backup, but to be
running with a version of PostgreSQL with is only about a month old.
Was 8.1.21 the version you were running at the time of the failure,
or have you upgraded during the recovery attempt? If you've
upgraded, the version in use when the corruption occur

This storage server has RAID and there are backups, it just so happens that
the most recent usable backup is from June 20th. I completely forgot to
configure the backups on this server. I normally wouldn't make this mistake,
but I did this time.

On the version, this is the version that comes standard with CentOS 5.5.
This was a clean CentOS 5.5 install and it's been live for about a month.

>> Also, it would help a lot to know what your postgresql.conf file
>> contains (excluding all comments).

The only uncommented lines are:
max_connections = 500
shared_buffers = 4000
redirect_stderr = on
log_directory = 'pg_log'
log_filename = 'postgresql-%a.log'
log_truncate_on_rotation = on
log_rotation_age = 1440
log_rotation_size = 0
redirect_stderr = on
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'

I can't, in good conscience, recommend any recovery attempts until
you confirm that you have a copy to restore if the cleanup effort
misfires.

I have a full backup of the entire directory structure I took shortly after
the database became unusable.

On Wed, Jun 30, 2010 at 10:14 AM, Kevin Grittner <
Kevin.Grittner@wicourts.gov> wrote:

> Nathan Robertson wrote:
>
> > There was a cascade effect. Apache failed which caused the server
> > overall to fail. The data is stored on an iSCSI drive and the
> > mount of the iSCSI drive became corrupt when everything failed. I
> > was able to remount the drive and get access to data now I have
> > this index error.
>
> Now we're getting somewhere. The disk drive "became corrupt" while
> PostgreSQL was running? Was the drive unmounted or remounted while
> PostgreSQL was running, or did you stop PostgreSQL first? Do you
> have any errors in the PostgreSQL log from the time this was all
> going on?
>
> Also, how confident are you that the Apache failure caused the drive
> to be corrupted? That sounds *much* less likely than the other way
> around. Without understanding that better, fixing one particular
> problem in the database on this machine might be like rearranging
> deck chairs on a sinking ship.
>
> > So, this is where I'm at. If anyone could help resolve the index
> > cache error I would be eternally great full.
>
> We'd like to help, and perhaps someone else can suggest something on
> the basis of information you've provided so far, but I'm not
> comfortable suggesting something without a little more of a sense of
> what happened and what your configuration is.
>
> >> Also, it would help a lot to know what your postgresql.conf file
> >> contains (excluding all comments).
>
> This would still be useful.
>
> >> But first and foremost, you should make a file-copy backup of
> >> your entire PostgreSQL data directory tree with the PostgreSQL
> >> server stopped, if you haven't done that already. Any attempt at
> >> recovery may misfire, and you might want to get back to what you
> >> have now.
>
> I can't, in good conscience, recommend any recovery attempts until
> you confirm that you have a copy to restore if the cleanup effort
> misfires.
> red could be
> relevant.
> One more question occurs to me -- it seems unusual for someone to be
> running on a single disk with no RAID and no backup, but to be
> running with a version of PostgreSQL with is only about a month old.
> Was 8.1.21 the version you were running at the time of the failure,
> or have you upgraded during the recovery attempt? If you've
> upgraded, the version in use when the corruption occur
>
> -Kevin
>

--001636c5c1959dfc04048a40468c
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi Kevin,

Thanks for the response.

Now we're getting some=
where. =A0The disk drive "became corrupt" while

PostgreSQL was running? =A0Was the drive unmounted or remounted while

PostgreSQL was running, or did you stop PostgreSQL first? =A0Do you

have any errors in the PostgreSQL log from the time this was all

going on?


The failure basically happened because the Django webapp we're runn=
ing isn't effectively closing database connections. So, memory is compl=
etely filling up and causing the server to hang. Yesterday, when this happe=
ned it caused the entire network interface to become inoperable which meant=
that the iscsi connection to the shared drive stopped working and data bec=
ame corrupt.


I stopped the postgresql service before unmounting and remounting the t=
arget.

My first concern is restoring the =
database. I'll fix the problems with django and apache later. I can dea=
l with those problems. I'm also going to create a series of database ba=
ckups that can be used to quickly restore data if this happens again. My co=
ncern is simply just getting this back to baseline.




One more question occurs to me -- it seems unusual for someone to be


running on a single disk with no RAID and no backup, but to be


running with a version of PostgreSQL with is only about a month old.


Was 8.1.21 the version you were running at the time of the failure,


or have you upgraded during the recovery attempt? =A0If you've


upgraded, the version in use when the corruption occur

This storage =
server has RAID and there are backups, it just so happens that the most rec=
ent usable backup is from June 20th. I completely forgot to configure the b=
ackups on this server. I normally wouldn't make this mistake, but I did=
this time.


On the version, this is the version that comes standard with CentOS 5.5=
.. This was a clean CentOS 5.5 install and it's been live for about a mo=
nth.

>> Also, it would help a lot to know what your postgresql=
..conf=20
file

>> contains (excluding all comments).


The only uncommented lines are:

max_connections =3D 500
shared_buffers =3D 4000
redirect_stderr =3D o=
n
log_directory =3D 'pg_log'            =
  =A0
log_filename =3D 'postgresql-%a.log'
log_truncate=
_on_rotation =3D on
log_rotation_age =3D 1440

log_rotation_size =3D 0
redirect_stderr =3D on
lc_monetary =3D 'e=
n_US.UTF-8'
lc_numeric =3D 'en_US.UTF-8'
lc_time =3D =
9;en_US.UTF-8'

I can't, in good conscience, recommend any re=
covery attempts until


you confirm that you have a copy to restore if the cleanup effort

misfires.


I have a full backup of the entire directory structure I took shortly a=
fter the database became unusable.

On Wed, Jun 30, 2010 at 10:14 AM,=
Kevin Grittner < urts.gov">Kevin.Grittner@wicourts.gov> wrote:

r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
>Nathan Robertson <nathan.=
robertson@gmail.com
> wrote:




> There was a cascade effect. Apache failed whic=
h caused the server

> overall to fail. The data is stored on an iSCSI drive and the

> mount of the iSCSI drive became corrupt when everything failed. I

> was able to remount the drive and get access to data now I have

> this index error.



Now we're getting somewhere. =A0The disk drive "became corru=
pt" while

PostgreSQL was running? =A0Was the drive unmounted or remounted while

PostgreSQL was running, or did you stop PostgreSQL first? =A0Do you

have any errors in the PostgreSQL log from the time this was all

going on?



Also, how confident are you that the Apache failure caused the drive

to be corrupted? =A0That sounds *much* less likely than the other way

around. =A0Without understanding that better, fixing one particular

problem in the database on this machine might be like rearranging

deck chairs on a sinking ship.



> So, this is where I'm at. If anyone could help resolve the index r>
> cache error I would be eternally great full.



We'd like to help, and perhaps someone else can suggest something=
on

the basis of information you've provided so far, but I'm not

comfortable suggesting something without a little more of a sense of

what happened and what your configuration is.



>> Also, it would help a lot to know what your postgresql.conf file r>
>> contains (excluding all comments).



This would still be useful.



>> But first and foremost, you should make a file-copy backup of

>> your entire PostgreSQL data directory tree with the PostgreSQL

>> server stopped, if you haven't done that already. =A0Any attem=
pt at

>> recovery may misfire, and you might want to get back to what you r>
>> have now.



I can't, in good conscience, recommend any recovery attempts unti=
l

you confirm that you have a copy to restore if the cleanup effort

misfires.

red could be

relevant.

One more question occurs to me -- it seems unusual for someone to be

running on a single disk with no RAID and no backup, but to be

running with a version of PostgreSQL with is only about a month old.

Was 8.1.21 the version you were running at the time of the failure,

or have you upgraded during the recovery attempt? =A0If you've

upgraded, the version in use when the corruption occur



-Kevin




--001636c5c1959dfc04048a40468c--

Re: cache lookup failed for index

am 30.06.2010 17:51:12 von Kevin Grittner

Nathan Robertson wrote:

> The failure basically happened because the Django webapp we're
> running isn't effectively closing database connections. So, memory
> is completely filling up and causing the server to hang.
> Yesterday, when this happened it caused the entire network
> interface to become inoperable which meant that the iscsi
> connection to the shared drive stopped working and data became
> corrupt.
>
> I stopped the postgresql service before unmounting and remounting
> the target.

OK, I think the appropriate next step would be to try to run the
PostgreSQL cluster in single-user mode:

http://www.postgresql.org/docs/8.1/interactive/app-postgres. html

Try to REINDEX pg_class_oid_index in that mode. If that fails, it
might possibly help to run these statements and try the REINDEX
command again:

set enable_indexscan = off;
set enable_bitmapscan = off;

I hope this helps.

-Kevin

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: cache lookup failed for index

am 30.06.2010 18:04:06 von Nathan Robertson

--001636c5c011dcfe1f048a418085
Content-Type: text/plain; charset=ISO-8859-1

Hi Kevin,

Thanks for this. I've found a lot of information on this online but I'm a
little unclear about how exactly I should connect and run the reindex.

My thinking based on the documentation is I run (as postgres user):
postgres -O -P -D /dbcluster/location

Then I run:
REINDEX TABLE pg_class_oid_in;

Is this correct?

Nate


On Wed, Jun 30, 2010 at 11:51 AM, Kevin Grittner <
Kevin.Grittner@wicourts.gov> wrote:

> Nathan Robertson wrote:
>
> > The failure basically happened because the Django webapp we're
> > running isn't effectively closing database connections. So, memory
> > is completely filling up and causing the server to hang.
> > Yesterday, when this happened it caused the entire network
> > interface to become inoperable which meant that the iscsi
> > connection to the shared drive stopped working and data became
> > corrupt.
> >
> > I stopped the postgresql service before unmounting and remounting
> > the target.
>
> OK, I think the appropriate next step would be to try to run the
> PostgreSQL cluster in single-user mode:
>
> http://www.postgresql.org/docs/8.1/interactive/app-postgres. html
>
> Try to REINDEX pg_class_oid_index in that mode. If that fails, it
> might possibly help to run these statements and try the REINDEX
> command again:
>
> set enable_indexscan = off;
> set enable_bitmapscan = off;
>
> I hope this helps.
>
> -Kevin
>

--001636c5c011dcfe1f048a418085
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Hi Kevin,

Thanks for this. I've found a lot of information on th=
is online but I'm a little unclear about how exactly I should connect a=
nd run the reindex.

My thinking based on the documentation is I run =
(as postgres user):

postgres -O -P -D /dbcluster/location

Then I run:
REINDEX TABLE p=
g_class_oid_in;

Is this correct?

Nate


=3D"gmail_quote">On Wed, Jun 30, 2010 at 11:51 AM, Kevin Grittner =3D"ltr"><Kevin.Grittner@=
wicourts.gov
>
wrote:

r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
>Nathan Robertson <nathan.=
robertson@gmail.com
> wrote:




> The failure basically happened because the Dja=
ngo webapp we're

> running isn't effectively closing database connections. So, memory=


> is completely filling up and causing the server to hang.

> Yesterday, when this happened it caused the entire network

> interface to become inoperable which meant that the iscsi

> connection to the shared drive stopped working and data became

> corrupt.

>

> I stopped the postgresql service before unmounting and remounting

> the target.



OK, I think the appropriate next step would be to try to run the

PostgreSQL cluster in single-user mode:



" target=3D"_blank">http://www.postgresql.org/docs/8.1/interac tive/app-post=
gres.html




Try to REINDEX pg_class_oid_index in that mode. =A0If that fails, it

might possibly help to run these statements and try the REINDEX

command again:



set enable_indexscan =3D off;

set enable_bitmapscan =3D off;



I hope this helps.



-Kevin




--001636c5c011dcfe1f048a418085--

Re: cache lookup failed for index

am 30.06.2010 18:13:21 von Kevin Grittner

Nathan Robertson wrote:

> My thinking based on the documentation is I run (as postgres
> user):
> postgres -O -P -D /dbcluster/location

Looks good to me. In fact, I hadn't remembered the -P option;
definitely a good choice here, and it should obviate the need to try
to disable the index usage using the commands I showed.

> Then I run:
> REINDEX TABLE pg_class_oid_in;

You either need to specify the INDEX keyword or a table name.
Perhaps this would be best:

REINDEX TABLE pg_class;

-Kevin

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: cache lookup failed for index

am 30.06.2010 18:14:23 von Tom Lane

"Kevin Grittner" writes:
> OK, I think the appropriate next step would be to try to run the
> PostgreSQL cluster in single-user mode:

> http://www.postgresql.org/docs/8.1/interactive/app-postgres. html

> Try to REINDEX pg_class_oid_index in that mode. If that fails, it
> might possibly help to run these statements and try the REINDEX
> command again:

> set enable_indexscan = off;
> set enable_bitmapscan = off;

Those won't help. What you *will* need, in order to even start the
single-user backend, is to tell it to disregard system indexes
(-P command line option).

I wouldn't be too surprised if the corruption extends a lot further than
the one index :-( but maybe you will be able to extract something after
reindexing.

regards, tom lane

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: cache lookup failed for index

am 30.06.2010 18:19:18 von Nathan Robertson

--001636c9286235a89c048a41b7f5
Content-Type: text/plain; charset=ISO-8859-1

Thanks Tom.

OK, I ran:
postgres -O -P -D /cluster/location

reindex table pg_class;
backend> reindex table pg_class;

And then nothing returns. Nothing stating whether it was successful or a
failure.

And then if I do:
bash-3.2$ postgres -O -P -D /shared/webapp/database webapp

I still get:
FATAL: cache lookup failed for index 2662


On Wed, Jun 30, 2010 at 12:14 PM, Tom Lane wrote:

> "Kevin Grittner" writes:
> > OK, I think the appropriate next step would be to try to run the
> > PostgreSQL cluster in single-user mode:
>
> > http://www.postgresql.org/docs/8.1/interactive/app-postgres. html
>
> > Try to REINDEX pg_class_oid_index in that mode. If that fails, it
> > might possibly help to run these statements and try the REINDEX
> > command again:
>
> > set enable_indexscan = off;
> > set enable_bitmapscan = off;
>
> Those won't help. What you *will* need, in order to even start the
> single-user backend, is to tell it to disregard system indexes
> (-P command line option).
>
> I wouldn't be too surprised if the corruption extends a lot further than
> the one index :-( but maybe you will be able to extract something after
> reindexing.
>
> regards, tom lane
>

--001636c9286235a89c048a41b7f5
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Thanks Tom.

OK, I ran:
=A0postgres -O -P -D /cluster/location
=

reindex table pg_class;
backend> reindex table pg_class;

A=
nd then nothing returns. Nothing stating whether it was successful or a fai=
lure.


And then if I do:
bash-3.2$ postgres -O -P -D /shared/webapp/databas=
e webapp

I still get:
FATAL:=A0 cache lookup failed for index 266=
2


On Wed, Jun 30, 2010 at 12:14 PM, T=
om Lane <tgl@sss.=
pgh.pa.us
>
wrote:

r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
>"Kevin Grittner" < ov">Kevin.Grittner@wicourts.gov> writes:


> OK, I think the appropriate next step would be to try to run the

> PostgreSQL cluster in single-user mode:



> ..html" target=3D"_blank">http://www.postgresql.org/docs/8.1/interac tive/app=
-postgres.html




> Try to REINDEX pg_class_oid_index in that mode. =A0If that fails, it r>
> might possibly help to run these statements and try the REINDEX

> command again:



> set enable_indexscan =3D off;

> set enable_bitmapscan =3D off;



Those won't help. =A0What you *will* need, in order to even start=
the

single-user backend, is to tell it to disregard system indexes

(-P command line option).



I wouldn't be too surprised if the corruption extends a lot further tha=
n

the one index :-( but maybe you will be able to extract something after

reindexing.



=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0regards, tom lane




--001636c9286235a89c048a41b7f5--

Re: cache lookup failed for index

am 30.06.2010 19:04:59 von Tom Lane

Nathan Robertson writes:
> OK, I ran:
> postgres -O -P -D /cluster/location

This probably connected to the postgres database, not webapp which is
where your problem is.

> backend> reindex table pg_class;

> And then nothing returns. Nothing stating whether it was successful or a
> failure.

The standalone mode isn't very verbose. If you got a prompt back
without any error then it's OK. Just type ^D at the prompt to exit.

regards, tom lane

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: cache lookup failed for index

am 30.06.2010 21:18:00 von Nathan Robertson

--001636c595d34df146048a443664
Content-Type: text/plain; charset=ISO-8859-1

*This probably connected to the postgres database, not webapp which is
where your problem is.
*
OK, is there anyway I can force a connection to this database so I can get a
look at the tables (webapp)? To be honest, if there is corrupt data in
there, that is fine. I'll manually pull it out. I just need to be able to
figure out what uncorrupted data is in there so I can determine if it's even
worth saving.

On Wed, Jun 30, 2010 at 1:04 PM, Tom Lane wrote:

> Nathan Robertson writes:
> > OK, I ran:
> > postgres -O -P -D /cluster/location
>
> This probably connected to the postgres database, not webapp which is
> where your problem is.
>
> > backend> reindex table pg_class;
>
> > And then nothing returns. Nothing stating whether it was successful or a
> > failure.
>
> The standalone mode isn't very verbose. If you got a prompt back
> without any error then it's OK. Just type ^D at the prompt to exit.
>
> regards, tom lane
>

--001636c595d34df146048a443664
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

This probably connected to the postgres database, not webapp which is >
where your problem is.


OK, is there anyway I can force a connection to this database so I can =
get a look at the tables (webapp)? To be honest, if there is corrupt data i=
n there, that is fine. I'll manually pull it out. I just need to be abl=
e to figure out what uncorrupted data is in there so I can determine if it&=
#39;s even worth saving.


On Wed, Jun 30, 2010 at 1:04 PM, Tom Lane pan dir=3D"ltr"><tgl@sss.pgh.pa.us<=
/a>> wrote:
0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left=
: 1ex;">
Nathan Robertson < mail.com">nathan.robertson@gmail.com> writes:

> OK, I ran:

> =A0postgres -O -P -D /cluster/location



This probably connected to the postgres database, not webapp which is=


where your problem is.



> backend> reindex table pg_class;



> And then nothing returns. Nothing stating whether it was successful or=
a

> failure.



The standalone mode isn't very verbose. =A0If you got a prompt ba=
ck

without any error then it's OK. =A0Just type ^D at the prompt to exit.<=
br>


=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0regards, tom lane




--001636c595d34df146048a443664--

Re: cache lookup failed for index

am 30.06.2010 21:20:11 von Tom Lane

Nathan Robertson writes:
> *This probably connected to the postgres database, not webapp which is
> where your problem is.
> *
> OK, is there anyway I can force a connection to this database

postgres -O -P -D /cluster/location webapp

(You don't really need the -O, but it probably doesn't hurt either.)

regards, tom lane

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: cache lookup failed for index

am 30.06.2010 22:05:26 von Nathan Robertson

--0016e68ee227f6c8ca048a44df8e
Content-Type: text/plain; charset=ISO-8859-1

OK, as postgres user:
-bash-3.2$ postgres -O -P -D /tmp/database webapp
FATAL: cache lookup failed for index 2662


On Wed, Jun 30, 2010 at 3:20 PM, Tom Lane wrote:

> Nathan Robertson writes:
> > *This probably connected to the postgres database, not webapp which is
> > where your problem is.
> > *
> > OK, is there anyway I can force a connection to this database
>
> postgres -O -P -D /cluster/location webapp
>
> (You don't really need the -O, but it probably doesn't hurt either.)
>
> regards, tom lane
>

--0016e68ee227f6c8ca048a44df8e
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

OK, as postgres user:
-bash-3.2$ postgres -O=A0 -P -D /tmp/database weba=
pp
FATAL:=A0 cache lookup failed for index 2662


"gmail_quote">On Wed, Jun 30, 2010 at 3:20 PM, Tom Lane &=
lt;>
wr=
ote:

r-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
>Nathan Robertson <nathan.=
robertson@gmail.com
> writes:


> *This probably connected to the postgres database, not webapp wh=
ich is

> where your problem is.

> *

> OK, is there anyway I can force a connection to this database



postgres -O -P -D /cluster/location webapp



(You don't really need the -O, but it probably doesn't hurt either.=
)



=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0regards, tom lane




--0016e68ee227f6c8ca048a44df8e--

Re: cache lookup failed for index

am 30.06.2010 22:30:15 von Tom Lane

Nathan Robertson writes:
> OK, as postgres user:
> -bash-3.2$ postgres -O -P -D /tmp/database webapp
> FATAL: cache lookup failed for index 2662

Even with -P? Wow, that's bad. This DB may be just toast I'm afraid.
But try it like this:

postgres -O -P -D /tmp/database -c log_error_verbosity=verbose webapp

This should give you the exact location of the error, which will give us
a bit better clue what's failing.

regards, tom lane

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: cache lookup failed for index

am 30.06.2010 22:59:24 von Tom Lane

Nathan Robertson writes:
> -bash-3.2$ postgres -O -P -D /tmp/database -c log_error_verbosity=verbose
> webapp
> FATAL: XX000: cache lookup failed for index 2662
> LOCATION: RelationInitIndexAccessInfo, relcache.c:841

Um ... and you said this was current 8.1.x ... so it's dying here:

tuple = SearchSysCache(INDEXRELID,
ObjectIdGetDatum(RelationGetRelid(relation)),
0, 0, 0);
if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed for index %u",
RelationGetRelid(relation));

I interpret this to mean that it can't find the pg_index row for
pg_class_oid_index --- and since you're using -P, that doesn't just mean
corruption in pg_index's indexes, but that the tuple can't be found even
by seqscanning the whole catalog. Probably the whole page it's in got
wiped out by your filesystem-level failure.

I hate to be the bearer of bad news, but I think this DB may be beyond
recovery. It's very unlikely that there's just the one tuple gone.
If you're willing to throw money at the problem, there are various
people who offer consulting services that include trying to reconstruct
broken Postgres databases; but you might be best advised to just take
your lumps and go back to your last good backup. At this point you're
looking at a significant investment of time with no guarantee of being
able to extract anything very useful.

What I'm taking from this is another horror story about the risks of
mounting databases across networks instead of locally :-(. Postgres
is only as reliable as the storage it's using.

regards, tom lane

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin